Web Data Extraction Based on Ensemble Learning
Authors
Abstract
Similar resources
Toward an Ontology-based Web Data Extraction
Many web sites provide regularly updated data in a fixed structure. These data are very useful for some applications with autonomous agents (e.g. to determine the exchange rates). However, data extraction from these sites is nontrivial because of the great variations from one site to another. In this paper, we propose an approach based on ontology, which facilitates the formalization and the ex...
Graph Grammar Based Web Data Extraction
Web data extraction became a hot topic after the invention of the World Wide Web, because the large amount of information on the Web makes it challenging to retrieve useful information. Due to the diverse designs and presentations of information on different Web sites, it is hard to implement a general solution to extract data across different Web sites. This paper presents a novel method based on...
Vision-based Web Data Records Extraction
This paper studies the problem of extracting data records on the response pages returned from web databases or search engines. Existing solutions to this problem are based primarily on analyzing the HTML DOM trees and tags of the response pages. While these solutions can achieve good results, they are too heavily dependent on the specifics of HTML and they may have to be changed should the resp...
Web-based closed-domain data extraction on online advertisements
Taking advantage of the popularity of the web, online marketplaces such as Ebay (.com), advertisements (ads for short) websites such as Craigslist (.org), and commercial websites such as Carmax (.com) (allow users to) post ads on a variety of products and services. Instead of browsing through numerous websites to locate ads of interest, web users would benefit from the existence of a single, full...
Deep Web Data Extraction Based on URL and Domain Classification
ISACA Journal, Volume 4, 2015. The rapid development of computer and networking technologies has increased the popularity of the web, which has led to the presence of more and more information on the web. However, the explosive increase of information online leads to some search problems: specifically, search engines usually return too many unrelated results for a given query. Deep web is content ...
Journal
Journal title: International Journal of Database Theory and Application
Year: 2015
ISSN: 2005-4270
DOI: 10.14257/ijdta.2015.8.3.27